-
Notifications
You must be signed in to change notification settings - Fork 14k
Implement partial_sort_unstable for slice #149318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
7af04ad to
115ac5c
Compare
This comment has been minimized.
This comment has been minimized.
Signed-off-by: tison <wander4096@gmail.com>
115ac5c to
0e87d5d
Compare
Signed-off-by: tison <wander4096@gmail.com>
|
cc @orlp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some remarks, some on style, but also some with substance.
Besides comments on the code that's written, I do note a lack of tests?
| /// as if the whole slice were sorted in ascending order. | ||
| /// | ||
| /// This sort is unstable (i.e., may reorder equal elements), in-place (i.e., does not | ||
| /// allocate), and *O*(*n* + *k* \* log(*k*)) worst-case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not specified what k is.
| /// | ||
| /// If the implementation of [`Ord`] for `T` does not implement a [total order], the function | ||
| /// may panic; even if the function exits normally, the resulting order of elements in the slice | ||
| /// is unspecified. See also the note on panicking below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to repeat the entire comment with examples from sort_unstable_by here, I think we can just refer the reader to its comment. This concerns this paragraph as well as all following paragraphs up until # Panics.
| where | ||
| F: FnMut(&T, &T) -> Ordering, | ||
| R: RangeBounds<usize>, | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this implementation should be under sort::unstable::partial_sort instead, and like its cousin sort::unstable::sort take a function returning a boolean. Then partial_sort_unstable(_by(_key)) can all dispatch to that one directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then partial_sort_unstable(_by(_key)) can all dispatch to that one directly.
Not quite. I think sort_unstable and sort_unstable_by_key should instead redirect to sort_unstable_by and so does select_nth_unstable series, as what binary_search and partition_dedup do.
The final implement may be factored out to sort::unstable::partial_sort. But I'd prefer to align all these styles in a follow-up patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I think that would be a mistake, and would reject that follow-up patch. The sorts are incredibly sensitive to inlining and I would highly prefer to minimize the number of steps of indirection to maximize the chance the comparison is inlined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. Let me try to find a place for the common partial_sort impls :D
| /// Partially sorts the slice in ascending order **without** preserving the initial order of equal elements. | ||
| /// | ||
| /// Upon completion, the elements in the specified `range` will be the elements of the slice | ||
| /// as if the whole slice were sorted in ascending order. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is missing the guarantees I mentioned in my comment on the ACP about the elements that come before and after the sorted range.
| let Range { start, end } = slice::range(range, ..len); | ||
|
|
||
| if start + 1 > end { | ||
| // target range is empty, nothing to do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect if we do provide the guarantees for elements before and after the range I mentioned in my comment on the ACP. I think those guarantees are more useful than allowing the partial sort by an empty range in the middle to be a no-op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Let me try to figure out how to work it out. From what we have now, perhaps we'd select a on a..a for partitioning, and properly handle the case where the slice is empty to avoid panic.
| let len = self.len(); | ||
| let Range { start, end } = slice::range(range, ..len); | ||
|
|
||
| if start + 1 > end { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What a very strange way to write start == end. slice::range already disallows inverted ranges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I'd leave it for now and it's OK to change to start == end if it's more preferred. This is mainly personal preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it also costs two extra instructions compared to start == end: https://rust.godbolt.org/z/fe37Y8Pb3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your information! I'll update it in my next commit then.
| return; | ||
| } | ||
|
|
||
| let index = start; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think assigning start to a variable which is used only once and is later shadowed with a different definition is not a good idea. Just directly pass start to partition_at_index IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense.
| let (_, _, rest) = | ||
| sort::select::partition_at_index(self, index, |a, b| compare(a, b) == Less); | ||
|
|
||
| if start + 2 > end { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, very strange way to write end - start == 1. I think this branch is worthwhile though, select_nth_unstable does guarantee that this singular element is in its correct place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Despite the style issue, I'm considering also what we'd specially handle whenstart == 0 so that a partitioning by end - 1 + sort can eliminate the first select_nth(0). But I'm not sure if the complexity handling more cases worth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
start == 0 should be the most common case. And a..a would be easier to handle if we distinguish 0..0 (nothing to do) and a..a (a > 0) (must have (a-1)-th element to be partitioned on).
Let me try to apply this idea.
Doc tests cover most branches. I don't find a dedicated file to cover its cousin |
|
The examples can change at any time. And you didn't test, for example, the post-condition that all elements |
Thanks and yes. Do you know where the unit tests of |
|
I believe the bulk is found in https://github.com/rust-lang/rust/blob/main/library/alloctests/tests/sort/tests.rs. |
|
What I suggested in the ACP was a sketch implementation, I did some more thinking and I think the following handles all corner cases nicely: pub fn partial_sort<T, F, R>(mut v: &mut [T], range: R, is_less: &mut F)
where
F: FnMut(&T, &T) -> bool,
R: RangeBounds<usize>,
{
let len = v.len();
let Range { start, end } = slice::range(range, ..len);
if end - start <= 1 {
// Can be resolved in at most a single partition_at_index call, without
// further sorting. Do nothing if it is an empty range at start or end.
if start != len && end != 0 {
sort::select::partition_at_index(v, start, is_less);
}
return;
}
// Don't bother reducing the slice to sort if it eliminates fewer than 8 elements.
if end + 8 <= len {
v = sort::select::partition_at_index(v, end - 1, is_less).0;
}
if start >= 8 {
v = sort::select::partition_at_index(v, start, is_less).2;
}
sort::unstable::sort(v, is_less);
}And to formalize the post-conditions, I think the following should hold after a call to for i in 0..b {
for j in b..n {
assert!(v[i] <= v[j]);
}
}
for i in 0..e {
for j in e..n {
assert!(v[i] <= v[j]);
}
}
for i in b..e {
for j in i..e {
assert!(v[i] <= v[j]);
}
} |
This refers to #149046.